arXiv:1504.02870v1 [stat.ML] 11 Apr 2015


Quick sensitivity analysis for incremental data modification and 
its application to leave-one-out CV in linear classification problems 


Shota Okumura Yoshiki Suzuki 

Nagoya Institute of Technology Nagoya Institute of Technology
Nagoya, Japan Nagoya, Japan 

okumura.mllab.nit@gmail.com suzuki.mllab.nit@gmail.com

Ichiro Takeuchi* 

Nagoya Institute of Technology 
Nagoya, Japan 

takeuchi.ichiro@nitech.ac.jp 
April 14, 2015 


* Corresponding author 





Abstract 


We introduce a novel sensitivity analysis framework for large-scale classification problems that can
be used when a small number of instances are incrementally added or removed. For quickly updating
the classifier in such a situation, incremental learning algorithms have been intensively studied in the
literature. Although they are much more efficient than solving the optimization problem from scratch,
their computational complexity still depends on the entire training set size. This means that, if the original
training set is large, completely solving an incremental learning problem might still be rather expensive.
To circumvent this computational issue, we propose a novel framework that allows us to make inferences
about the updated classifier without actually re-optimizing it. Specifically, the proposed framework can
quickly provide lower and upper bounds of a quantity that depends on the unknown updated classifier. The main
advantage of the proposed framework is that the computational cost of computing these bounds depends
only on the number of updated instances. This property is quite advantageous in a typical sensitivity
analysis task where only a small number of instances are updated. In this paper we demonstrate that the
proposed framework is applicable to various practical sensitivity analysis tasks, and that the bounds provided
by the framework are often sufficiently tight for making the desired inferences.

Keywords: Incremental Learning, Sensitivity Analysis, Classification, Support Vector Machines, Logistic Regression,
Leave-one-out Cross-validation
1 Introduction


Given a large number of training instances, the initial training cost of a classifier such as logistic regression 
(LR) or support vector machine (SVM) would be quite expensive. In principle, there is no simple way around 
this initial training cost except when suboptimal approximate classifiers (e.g., trained by using random small 
sub-samples) are acceptable. Unfortunately, such an initial training cost is not the only thing we must care 
about in practice. In many practical data engineering tasks, the training set with which the initial classifier 
was trained might be slightly modified. In such a case, it is important to check the sensitivity of the classifier, 
i.e., how the results would change when the classifier is updated with the slightly modified training set. 

Machine learning algorithms designed specifically for updating a classifier when a small number of
instances are incrementally added or removed are called incremental learning algorithms [3]. For example, when a
single instance is added or removed, the solution of a linear predictor can be computed efficiently.
Incremental learning algorithms for SVMs and other related learning frameworks have been intensively
studied in the literature. Even for problems for which no explicit incremental learning algorithm
exists, the warm-start approach, in which the original optimal solution is used as the starting point
for the updated optimization problem, is usually very helpful for reducing incremental learning costs.

However, the computational cost of incremental learning is still very high if the original training set
is large. Except for special cases^1, any incremental learning algorithm must go through the entire training
data matrix at least once, meaning that its complexity depends on the entire training set size. When only a
small number of instances are modified, spending a large amount of computation on re-optimizing the
classifier hardly seems worthwhile, because inference results on the updated classifier would not
differ much from the original ones. Furthermore, in practical applications, it might be computationally
intractable to completely update the classifier every time there is a tiny modification of the training set. In
such a situation, it would be useful to quickly check the sensitivity of the classifier without actually
updating it. Unless the sensitivity is unacceptably large, we might want to use the original classifier as it is.

Our key observation here is that the goal of sensitivity analysis is not to update the classifier itself, 
but to know how much the results of our interest would change when the classifier is updated with the 
slightly modified training set. Suppose, for example, that we have a test instance. Then, we would be 
interested in whether there is a chance that the class label of the test instance could be changed by a 
minor data modification or not. In order to answer such a question, we propose a novel approach that can 
quickly compute the sensitivity of a quantity depending on the unknown updated classifier without actually 
re-optimizing it. 

In this paper we study a class of regularized linear binary classification problems with convex loss. We
propose a novel framework for this class of problems that can efficiently compute lower and upper
bounds of a general linear score of the updated classifier. Specifically, denoting the coefficient vector of
the updated linear classifier as $\beta^*_{\rm new}$, our framework allows us to obtain lower and upper bounds of a
general linear score of the form $\eta^\top \beta^*_{\rm new}$, where $\eta$ is an arbitrary vector of the appropriate dimension. An
advantage of our framework is that the complexity of computing the bounds depends only on the number
of updated instances, and does not depend on the size of the entire training set. This property is quite
advantageous in a typical sensitivity analysis where only a small number of instances are updated.

^1 For example, in incremental learning of SVM, adding or removing an instance whose margin is greater than one can be
done without any cost because such a modification does not change the solution.

Bounding a linear score of the form $\eta^\top \beta^*_{\rm new}$ is useful in a wide range of sensitivity analysis tasks.
First, by setting $\eta = e_j$, where $e_j$ is a vector with all 0 except 1 in the $j$-th position, we can obtain
lower and upper bounds of each coefficient $\beta^*_{{\rm new},j}$, $j = 1, \ldots, d$, where $d$ is the input dimension. Another
interesting example is the case where $\eta = x$, where $x$ is a test instance of interest. Note that, if the
lower bound of $x^\top \beta^*_{\rm new}$ is positive (resp. the upper bound is negative), then we can be sure that the test
instance is classified as positive (resp. negative). This means that the class label of a test instance might be
available even if we do not know the exact value of $\beta^*_{\rm new}$.

To the best of our knowledge, there are no other existing studies on sensitivity analysis that can be used
as generally as our framework. However, there are some closely related methods designed for particular
tasks. One such example that has been intensively studied in the literature is leave-one-out cross-validation
(LOOCV). In each step of LOOCV, a single instance is taken out from the original training set, and we
check whether the left-out instance is correctly classified or not by using the updated classifier. This task
fits exactly into our framework because we are only interested in the class label of the left-out instance, and
the optimal updated classifier itself is not actually required. Efficient LOOCV methods have been studied
for SVMs and other related learning methods^2. Some of these existing methods are built on
an idea similar to ours in the sense that the class label of a left-out test instance is efficiently determined by
computing bounds of the linear score $x^\top \beta^*_{\rm new}$. The bounds obtained by our proposed framework are different
from the bounds used in these existing LOOCV methods. We empirically show that an LOOCV computation
algorithm using our framework is much faster than existing methods.

The bound computation technique we use here is inspired by recent studies on safe feature screening,
which was introduced in the context of $L_1$ sparse feature modeling [5]. It allows us to identify sparse
features whose coefficients turn out to be zero at the optimal solution. The key idea used there is to bound
the Lagrange multipliers before actually solving the optimization problem for model fitting^3. The idea of
bounding the optimal solution without actually solving the optimization problem has recently been extended
in various directions. Our main technical contribution in this paper is to bring this
idea to sensitivity analysis problems and develop a novel framework for efficiently bounding general linear
scores with the cost depending only on the number of updated instances.

The rest of the paper is organized as follows. In §2 we describe the problem setup and present three
sensitivity analysis tasks to which our framework can be applied. In §3 we present our main result, which
enables us to compute lower and upper bounds of a general linear score with a computational
cost depending only on the number of updated instances. In addition, we apply the framework to the
three tasks described in §2. In §4 we discuss how to tighten the bounds when the bounds provided by the
framework are not sufficiently tight for making a desired inference. §5 is devoted to numerical experiments.
§6 concludes the paper and discusses a few future directions of this work. All the proofs are presented in
the Appendix.

^2 In these works, the main focus is not on computing LOOCV error itself, but on deriving a lower bound of LOOCV error.

^3 Lagrange multiplier values at the optimal solution tell us which features are active or non-active.


2 Preliminaries and basic idea 

In this section we first formulate the problem setup and clarify the difference between the proposed framework 
and conventional incremental learning approaches. Then, we discuss three sensitivity analysis tasks in which 
the proposed framework is useful. 


2.1 Problem setup 


In this paper we study binary classification problems. We consider an incremental learning setup, where we 
have already trained a classifier by using a training set, and then a small number of instances are added to 
and/or removed from the original training set. The goal of conventional incremental learning problems is to 
update the classifier by re-training it with the updated training set. Hereafter, we denote the original and
the updated training sets as $\{(x_i, y_i)\}_{i \in \mathcal{D}_{\rm old}}$ and $\{(x_i, y_i)\}_{i \in \mathcal{D}_{\rm new}}$, respectively, where
$\mathcal{D}_{\rm old}$ and $\mathcal{D}_{\rm new}$ are the sets of indices of the instances in the old and new training sets, with sizes
$n_{\rm old} := |\mathcal{D}_{\rm old}|$ and $n_{\rm new} := |\mathcal{D}_{\rm new}|$,
respectively. The input $x_i$ is assumed to be a $d$-dimensional vector and the class label $y_i$ takes either $-1$ or $+1$.
We denote the sets of added and removed instances as $\{(x_i, y_i)\}_{i \in \mathcal{A}}$ and $\{(x_i, y_i)\}_{i \in \mathcal{R}}$, where
$\mathcal{A} \subset \mathcal{D}_{\rm new}$ and $\mathcal{R} \subset \mathcal{D}_{\rm old}$ are the sets of indices of the added and removed instances, with sizes
$n_A := |\mathcal{A}|$ and $n_R := |\mathcal{R}|$,
respectively. Note that, if one wants to modify an instance in the training set, one can first remove it and 
then add the modified one. 

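We consider a linear classifier of the form

$$\hat{y} = \begin{cases} +1 & \text{if } f(x; \beta) > 0, \\ -1 & \text{if } f(x; \beta) < 0, \end{cases} \qquad \text{with } f(x; \beta) = x^\top \beta,$$

where the classifier predicts the class label $\hat{y} \in \{-1, +1\}$ for the given input $x \in \mathbb{R}^d$, while $\beta \in \mathbb{R}^d$ is a vector
of the classifier's coefficients. In this paper we consider a class of problems represented as the minimization of an
$L_2$-regularized convex loss. Specifically, the old and the new classifiers are defined as

$$\beta^*_{\rm old} := \arg\min_{\beta \in \mathbb{R}^d} \frac{1}{n_{\rm old}} \sum_{i \in \mathcal{D}_{\rm old}} \ell(y_i, f(x_i; \beta)) + \frac{\lambda}{2} \|\beta\|^2, \tag{1}$$

and

$$\beta^*_{\rm new} := \arg\min_{\beta \in \mathbb{R}^d} \frac{1}{n_{\rm new}} \sum_{i \in \mathcal{D}_{\rm new}} \ell(y_i, f(x_i; \beta)) + \frac{\lambda}{2} \|\beta\|^2, \tag{2}$$
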
where the first and the second terms of the objective function represent an empirical loss term and an
$L_2$-regularization term, respectively, and $\lambda > 0$ is a regularization parameter that controls the balance between
these two terms. We assume that $\ell(\cdot, \cdot)$ is differentiable and convex with respect to the second argument.
Examples of such a loss function include the logistic regression loss

$$\ell(y_i, f(x_i; \beta)) := \log(1 + \exp(-y_i f(x_i; \beta))), \tag{3}$$

and the $L_2$-hinge loss

$$\ell(y_i, f(x_i; \beta)) := \max\{0, 1 - y_i f(x_i; \beta)\}^2. \tag{4}$$


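For any $i \in \mathcal{D}_{\rm old} \cup \mathcal{D}_{\rm new}$ and any $\beta_0 \in \mathbb{R}^d$, we denote the gradient of the individual loss as

$$\nabla \ell_i(\beta_0) := \left. \frac{\partial}{\partial \beta} \ell(y_i, f(x_i; \beta)) \right|_{\beta = \beta_0}.$$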
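As an illustration of the gradient $\nabla \ell_i(\beta_0)$ for the two example losses (3) and (4), the following minimal Python sketch may help. The function names `dot`, `grad_logistic` and `grad_sq_hinge` are hypothetical helpers introduced here for illustration only, and plain lists are used instead of a numerical library to keep the sketch dependency-free.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def grad_logistic(x, y, beta):
    """Gradient w.r.t. beta of log(1 + exp(-y * x^T beta)), i.e. loss (3)."""
    margin = y * dot(x, beta)
    # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m)); two branches avoid overflow.
    if margin > 0:
        e = math.exp(-margin)
        coeff = -y * e / (1.0 + e)
    else:
        coeff = -y / (1.0 + math.exp(margin))
    return [coeff * xj for xj in x]


def grad_sq_hinge(x, y, beta):
    """Gradient w.r.t. beta of max(0, 1 - y * x^T beta)^2, i.e. loss (4)."""
    slack = max(0.0, 1.0 - y * dot(x, beta))
    return [-2.0 * y * slack * xj for xj in x]
```

Both gradients are scalar multiples of $x_i$, so each costs $O(d)$ once the margin $y_i x_i^\top \beta_0$ is known.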


Our main interest is in the cases where the number of added instances $n_A$ and removed instances $n_R$ are
both much smaller than the entire training set size $n_{\rm old}$ or $n_{\rm new}$. In such a case, we expect that the difference
between $\beta^*_{\rm old}$ and $\beta^*_{\rm new}$ is small. However, if the training set size $n_{\rm new}$ is large, solving the optimization
problem (2) by using an incremental learning algorithm is still very expensive because any incremental
learning algorithm requires working through the entire training data matrix at least once, meaning that the
complexity of such incremental learning is at least $O(n_{\rm new} d)$.

Our approach is different from conventional incremental learning. In this paper we propose a novel
framework that enables us to make inferences about the new solution $\beta^*_{\rm new}$ without actually solving the
optimization problem (2). The proposed framework can efficiently compute lower and upper bounds
of what we call a linear score

$$\eta^\top \beta^*_{\rm new}, \tag{5}$$

where $\eta \in \mathbb{R}^d$ is an arbitrary vector of dimension $d$. An advantage of this framework is that the computational
cost of computing these bounds depends only on the number of updated instances $n_A + n_R$ and does not
depend on the entire training set size $n_{\rm old}$ or $n_{\rm new}$, i.e., the complexity is $O((n_A + n_R) d)$. This property
is quite advantageous in a typical sensitivity analysis where $n_{\rm new}$ is much larger than $n_A + n_R$. These
bounds are computed based on the old optimal solution $\beta^*_{\rm old}$. We denote the lower and the upper bounds as
$L(\eta^\top \beta^*_{\rm new})$ and $U(\eta^\top \beta^*_{\rm new})$, respectively, i.e., they satisfy

$$L(\eta^\top \beta^*_{\rm new}) \le \eta^\top \beta^*_{\rm new} \le U(\eta^\top \beta^*_{\rm new}).$$


The proposed framework can be kernelized for nonlinear classification problems if the inner products
$\eta^\top \beta^*_{\rm old}$ and $\eta^\top \nabla \ell_i(\beta^*_{\rm old})$ for any $i \in \mathcal{D}_{\rm old} \cup \mathcal{D}_{\rm new}$ can be represented by the kernel function.

In the following three subsections, we discuss three sensitivity analysis tasks in which the above proposed 
framework might be useful. 
Figure 1: Examples of coefficient bounds $L(\beta^*_{{\rm new},j})$ and $U(\beta^*_{{\rm new},j})$, $j \in [5]$, for an artificial toy dataset with
$n_{\rm old} = 1000$ and $d = 5$. The blue, red and pink error bars indicate the bounds when $n_A + n_R = 1$ (0.1%), 5
(0.5%), and 10 (1%), respectively. The unknown true coefficients $\beta^*_{{\rm new},j}$, $j \in [5]$, are indicated by $\times$.

2.2 Sensitivity of coefficients 

Let $e_j \in \mathbb{R}^d$, $j \in [d]$, be a vector of all 0 except 1 in the $j$-th element. Then, by setting $\eta := e_j$ in (5), we can
compute lower and upper bounds of the new classifier's coefficient $\beta^*_{{\rm new},j} = e_j^\top \beta^*_{\rm new}$, $j \in [d]$, such that

$$L(\beta^*_{{\rm new},j}) \le \beta^*_{{\rm new},j} \le U(\beta^*_{{\rm new},j}), \quad j \in [d].$$

Figure 1 illustrates such coefficient bounds for a simple toy dataset. Given lower and upper bounds of
the coefficients, we can, in principle, obtain bounds on any quantity depending on $\beta^*_{\rm new}$. Bounding the
largest possible change of the new classifier's coefficients, or of a quantity depending on them, would be beneficial
for making decisions in practical tasks.

2.3 Sensitivity of class labels 

Next, let us consider sensitivity analysis of the new class label for a test instance $x \in \mathbb{R}^d$, i.e., we would like
to know

$$\hat{y} := {\rm sgn}(f(x; \beta^*_{\rm new})) = {\rm sgn}(x^\top \beta^*_{\rm new}).$$


By setting $\eta := x$ in (5), we can compute lower and upper bounds such that

$$L(x^\top \beta^*_{\rm new}) \le x^\top \beta^*_{\rm new} \le U(x^\top \beta^*_{\rm new}). \tag{6}$$

Here, it is interesting to note that, using the following simple facts:

$$L(x^\top \beta^*_{\rm new}) > 0 \ \Rightarrow\ \hat{y} = +1, \tag{7a}$$

$$U(x^\top \beta^*_{\rm new}) < 0 \ \Rightarrow\ \hat{y} = -1, \tag{7b}$$

the class label $\hat{y}$ can be available without actually obtaining $\beta^*_{\rm new}$ if the bounds are sufficiently tight such
that the signs of the lower and the upper bounds are the same. If the number of updated instances $n_A + n_R$ is
small relative to the entire training set size $n_{\rm old}$ or $n_{\rm new}$, we expect that the two solutions $\beta^*_{\rm old}$ and
$\beta^*_{\rm new}$ would not be so different. In such cases, as we demonstrate empirically in §5, the bounds in (6) are
sufficiently tight in many cases. Figure 2 illustrates the tightness of the bounds in a toy dataset.

Figure 2: Examples of test instance score bounds $L(x^\top \beta^*_{\rm new})$ and $U(x^\top \beta^*_{\rm new})$ for 10 test instances in the
same dataset as in Figure 1. The blue, red and pink error bars indicate the bounds when $n_A + n_R = 1$
(0.1%), 5 (0.5%), and 10 (1%), respectively, and the unknown true scores $x^\top \beta^*_{\rm new}$ are indicated by $\times$. Note
that, except for the 2nd and the 3rd test instances with $n_A + n_R = 10$ (pink), the signs of the lower and
the upper bounds are the same, meaning that the class labels of these test instances are immediately available
without actually updating the classifier.


2.4 Leave-one-out cross-validation (LOOCV) 


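One of the traditional problem setups to which our proposed framework can be naturally applied is
leave-one-out cross-validation (LOOCV). The LOOCV error is defined as

$$\text{LOOCV error} := \frac{1}{n_{\rm old}} \sum_{h \in [n_{\rm old}]} I\left(y_h \ne {\rm sgn}(x_h^\top \beta^*_{(-h)})\right),$$

where ${\rm sgn}(\cdot)$ is the sign, and $\beta^*_{(-h)}$ is the optimal solution after leaving out the $h$-th instance, which is defined
as

$$\beta^*_{(-h)} := \arg\min_{\beta \in \mathbb{R}^d} \frac{1}{n_{\rm old} - 1} \sum_{i \in [n_{\rm old}] \setminus \{h\}} \ell(y_i, f(x_i; \beta)) + \frac{\lambda}{2} \|\beta\|^2.$$

Here, our idea is to regard the solution obtained from the whole training set as $\beta^*_{\rm old}$ and $\beta^*_{(-h)}$ as $\beta^*_{\rm new}$. By
setting $\eta := y_h x_h$ in (5), we can compute the bounds such that

$$L(y_h x_h^\top \beta^*_{(-h)}) \le y_h x_h^\top \beta^*_{(-h)} \le U(y_h x_h^\top \beta^*_{(-h)}). \tag{8}$$

These bounds in (8) can be used to know whether the left-out instance is correctly classified or not. If the
lower bound is positive, the left-out instance will be correctly classified, while it will be mis-classified if the
upper bound is negative.

Using (8), we can also obtain bounds on the LOOCV error itself:

$$\text{LOOCV error} \ge \frac{1}{n_{\rm old}} \sum_{h \in [n_{\rm old}]} I\left(U(y_h x_h^\top \beta^*_{(-h)}) < 0\right),$$

$$\text{LOOCV error} \le 1 - \frac{1}{n_{\rm old}} \sum_{h \in [n_{\rm old}]} I\left(L(y_h x_h^\top \beta^*_{(-h)}) > 0\right),$$

where $I(\cdot)$ is the indicator function. In numerical experiments, we illustrate that this approach works quite
well.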
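The two inequalities on the LOOCV error can be read as simple counting rules. The following Python sketch, with a hypothetical function name `loocv_error_bounds`, assumes that the per-instance bounds in (8) have already been computed and are passed in as two lists.

```python
def loocv_error_bounds(lower, upper):
    """Bracket the LOOCV error from per-instance score bounds.

    lower[h] and upper[h] are assumed to be L(.) and U(.) in (8)
    for the score y_h * x_h^T beta_{(-h)} of the h-th left-out instance.
    """
    n = len(lower)
    surely_wrong = sum(1 for u in upper if u < 0)   # U < 0: misclassified
    surely_right = sum(1 for l in lower if l > 0)   # L > 0: correctly classified
    return surely_wrong / n, 1.0 - surely_right / n
```

Instances whose bounds straddle zero are the only ones that keep the two bounds apart, so the bracket collapses to the exact LOOCV error once every sign is determined.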


3 Quick sensitivity analysis 


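In this section we present our main results on our quick sensitivity analysis framework. The following
theorem states that we can compute lower and upper bounds of a general linear score $\eta^\top \beta^*_{\rm new}$ by using
the original solution $\beta^*_{\rm old}$.

Theorem 1. Let

$$\Delta s := \frac{1}{n_A + n_R} \left( \sum_{i \in \mathcal{A}} \nabla \ell_i(\beta^*_{\rm old}) - \sum_{i \in \mathcal{R}} \nabla \ell_i(\beta^*_{\rm old}) \right). \tag{9}$$

Then, for an arbitrary vector $\eta \in \mathbb{R}^d$, the linear score $\eta^\top \beta^*_{\rm new}$ satisfies

$$\eta^\top \beta^*_{\rm new} \ge L(\eta^\top \beta^*_{\rm new}) := \frac{n_{\rm new} + n_{\rm old}}{2 n_{\rm new}} \eta^\top \beta^*_{\rm old} - \lambda^{-1} \frac{n_A + n_R}{2 n_{\rm new}} \eta^\top \Delta s - \frac{1}{2} \|\eta\| \left\| \frac{n_A - n_R}{n_{\rm new}} \beta^*_{\rm old} + \lambda^{-1} \frac{n_A + n_R}{n_{\rm new}} \Delta s \right\|, \tag{10a}$$

$$\eta^\top \beta^*_{\rm new} \le U(\eta^\top \beta^*_{\rm new}) := \frac{n_{\rm new} + n_{\rm old}}{2 n_{\rm new}} \eta^\top \beta^*_{\rm old} - \lambda^{-1} \frac{n_A + n_R}{2 n_{\rm new}} \eta^\top \Delta s + \frac{1}{2} \|\eta\| \left\| \frac{n_A - n_R}{n_{\rm new}} \beta^*_{\rm old} + \lambda^{-1} \frac{n_A + n_R}{n_{\rm new}} \Delta s \right\|. \tag{10b}$$

The proof is presented in the Appendix.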
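To make the computation in Theorem 1 concrete, the following Python sketch evaluates (9) and (10) for a given $\eta$. It is only an illustration under stated assumptions: the function name `linear_score_bounds` and its argument layout are hypothetical, the loss gradients of the added and removed instances at $\beta^*_{\rm old}$ are assumed to be supplied by the caller, and at least one instance is assumed to be updated.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def linear_score_bounds(eta, beta_old, grads_added, grads_removed, n_old, lam):
    """Return (L, U) for eta^T beta_new following Eqs. (9)-(10).

    grads_added / grads_removed hold the loss gradients, evaluated at
    beta_old, of the added and removed instances (lists of d-vectors).
    """
    n_a, n_r = len(grads_added), len(grads_removed)
    n_new = n_old + n_a - n_r
    d = len(beta_old)
    # Eq. (9): Delta s = (sum over A - sum over R) / (n_A + n_R)
    delta_s = [
        (sum(g[j] for g in grads_added) - sum(g[j] for g in grads_removed))
        / (n_a + n_r)
        for j in range(d)
    ]
    ratio = (n_a + n_r) / n_new
    # First two terms of (10a)/(10b), shared by both bounds.
    center = ((n_new + n_old) / (2 * n_new)) * dot(eta, beta_old) \
        - (ratio / (2 * lam)) * dot(eta, delta_s)
    # Vector inside the norm of the third term.
    v = [((n_a - n_r) / n_new) * b + (ratio / lam) * s
         for b, s in zip(beta_old, delta_s)]
    radius = 0.5 * math.sqrt(dot(eta, eta)) * math.sqrt(dot(v, v))
    return center - radius, center + radius
```

Only the $n_A + n_R$ supplied gradients and a few $d$-dimensional inner products are touched, which is where the $O((n_A + n_R) d)$ cost comes from.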

An advantage of the bounds in (10) is that the computational complexity does not depend on the total
number of instances, but only on the number of modified instances. It is easy to confirm that the main
computational cost of these bounds is in the computation of $\Delta s$ in (9), and its complexity is $O((n_A + n_R) d)$.
The tightness of the bounds, i.e., the difference between the upper and the lower bounds, is written as

$$U(\eta^\top \beta^*_{\rm new}) - L(\eta^\top \beta^*_{\rm new}) = \|\eta\| \left\| \frac{n_A - n_R}{n_{\rm new}} \beta^*_{\rm old} + \lambda^{-1} \frac{n_A + n_R}{n_{\rm new}} \Delta s \right\|. \tag{11}$$

In a typical sensitivity analysis where $n_A$ and $n_R$ are much smaller than $n_{\rm new}$, the tightness in (11) would be
small. Note also that the tightness depends inversely on the regularization parameter $\lambda$. If $\lambda$ is very small
and close to zero, the bounds become very loose.
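3.1 Sensitivity analysis of coefficients

As discussed in §2.2, by substituting $\eta := e_j$, $j \in [d]$, into (10), we obtain lower and upper bounds of
the coefficients of the new classifier.

Corollary 2. For $j \in [d]$, the coefficient of the new classifier satisfies

$$\beta^*_{{\rm new},j} \ge L(\beta^*_{{\rm new},j}) := \frac{n_{\rm new} + n_{\rm old}}{2 n_{\rm new}} \beta^*_{{\rm old},j} - \lambda^{-1} \frac{n_A + n_R}{2 n_{\rm new}} \Delta s_j - \frac{1}{2} \left\| \frac{n_A - n_R}{n_{\rm new}} \beta^*_{\rm old} + \lambda^{-1} \frac{n_A + n_R}{n_{\rm new}} \Delta s \right\|, \tag{12a}$$

$$\beta^*_{{\rm new},j} \le U(\beta^*_{{\rm new},j}) := \frac{n_{\rm new} + n_{\rm old}}{2 n_{\rm new}} \beta^*_{{\rm old},j} - \lambda^{-1} \frac{n_A + n_R}{2 n_{\rm new}} \Delta s_j + \frac{1}{2} \left\| \frac{n_A - n_R}{n_{\rm new}} \beta^*_{\rm old} + \lambda^{-1} \frac{n_A + n_R}{n_{\rm new}} \Delta s \right\|. \tag{12b}$$

Note that the third term does not depend on $j \in [d]$, i.e., the tightness of the bounds in (12) is common
for all the coefficients $\beta^*_{{\rm new},j}$, $j \in [d]$.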
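Because the third term in (12) is shared by all coordinates, all $d$ coefficient bounds can be obtained in a single pass. The sketch below illustrates this; the function name `coefficient_bounds` is hypothetical and $\Delta s$ from (9) is assumed to be precomputed.

```python
import math


def coefficient_bounds(beta_old, delta_s, n_a, n_r, n_old, lam):
    """Return lists (lower, upper) with the bounds (12a)-(12b) for every j."""
    n_new = n_old + n_a - n_r
    ratio = (n_a + n_r) / n_new
    v = [((n_a - n_r) / n_new) * b + (ratio / lam) * s
         for b, s in zip(beta_old, delta_s)]
    # Third term of (12): identical for every coordinate j.
    half_width = 0.5 * math.sqrt(sum(c * c for c in v))
    centers = [((n_new + n_old) / (2 * n_new)) * b - (ratio / (2 * lam)) * s
               for b, s in zip(beta_old, delta_s)]
    lower = [c - half_width for c in centers]
    upper = [c + half_width for c in centers]
    return lower, upper
```

The whole vector of bounds costs $O(d)$ once $\Delta s$ is available.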

Given lower and upper bounds of the coefficients $\beta^*_{{\rm new},j}$, $j \in [d]$, we can obtain bounds on
any quantity depending on $\beta^*_{\rm new}$. For example, it is straightforward to know how much the classifier's
coefficients can change by the incremental operation when the amount of the change is measured in terms
of some norm of $\beta^*_{\rm new} - \beta^*_{\rm old}$.


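Corollary 3. For any $q > 0$, let $\|z\|_q$ be the $L_q$ norm of a vector $z$. Then the difference between the old
and the new classifier's coefficients in $L_q$-norm is bounded from above as

$$\|\beta^*_{\rm new} - \beta^*_{\rm old}\|_q \le \left( \sum_{j \in [d]} \max\left\{ \beta^*_{{\rm old},j} - L(\beta^*_{{\rm new},j}),\ U(\beta^*_{{\rm new},j}) - \beta^*_{{\rm old},j} \right\}^q \right)^{1/q}. \tag{13}$$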
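The bound (13) can be evaluated directly from the coefficient bounds. The following short Python sketch shows one way to do so; the function name `coefficient_change_bound` is hypothetical, and the lists of lower and upper bounds are assumed to come from (12).

```python
def coefficient_change_bound(beta_old, lower, upper, q=2.0):
    """Upper bound on ||beta_new - beta_old||_q as in Eq. (13)."""
    # Largest possible move of each coordinate away from its old value.
    worst = [max(b - l, u - b) for b, l, u in zip(beta_old, lower, upper)]
    return sum(w ** q for w in worst) ** (1.0 / q)
```

For instance, with two coefficients whose largest possible moves are 3 and 4, the bound is 5 for $q = 2$ and 7 for $q = 1$.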


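Some readers might note that lower and upper bounds of a general linear score $\eta^\top \beta^*_{\rm new}$ can be
simply obtained by using the bounds of each coefficient $\beta^*_{{\rm new},j}$, $j \in [d]$. Such naive bounds are given as

$$\eta^\top \beta^*_{\rm new} \ge L(\eta^\top \beta^*_{\rm new}) := \sum_{j | \eta_j < 0} \eta_j U(\beta^*_{{\rm new},j}) + \sum_{j | \eta_j > 0} \eta_j L(\beta^*_{{\rm new},j}), \tag{14a}$$

$$\eta^\top \beta^*_{\rm new} \le U(\eta^\top \beta^*_{\rm new}) := \sum_{j | \eta_j < 0} \eta_j L(\beta^*_{{\rm new},j}) + \sum_{j | \eta_j > 0} \eta_j U(\beta^*_{{\rm new},j}). \tag{14b}$$

The tightness of the bounds in (14) is written as

$$U(\eta^\top \beta^*_{\rm new}) - L(\eta^\top \beta^*_{\rm new}) = \sum_{j \in [d]} |\eta_j| \left\| \frac{n_A - n_R}{n_{\rm new}} \beta^*_{\rm old} + \lambda^{-1} \frac{n_A + n_R}{n_{\rm new}} \Delta s \right\|,$$

which is never tighter than (11) because $\sum_{j \in [d]} |\eta_j| = \|\eta\|_1 \ge \|\eta\|$, and is typically much looser. Thus, if the
quantity of interest is written in a linear score form $\eta^\top \beta^*_{\rm new}$, we should use the bounds in (10) rather than (14).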
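The gap between the direct and the naive bounds comes down to the factor $\|\eta\|$ versus $\|\eta\|_1$. The small Python sketch below makes the comparison explicit; the function name `bound_widths` is hypothetical, and `radius_norm` is assumed to hold the common norm term that appears in (11).

```python
import math


def bound_widths(eta, radius_norm):
    """Widths U - L of the direct bounds (10) and of the naive bounds (14).

    radius_norm is the norm term shared by (11) and the naive tightness.
    """
    direct = math.sqrt(sum(e * e for e in eta)) * radius_norm   # ||eta||_2 factor
    naive = sum(abs(e) for e in eta) * radius_norm              # ||eta||_1 factor
    return direct, naive
```

For a dense $\eta$ in $d$ dimensions the ratio of the two widths can be as large as $\sqrt{d}$, which is why the direct bounds are preferable for linear scores.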
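3.2 Sensitivity analysis of class labels

Next, we use Theorem 1 for sensitivity analysis of new class labels. As discussed in §2.3, for an input vector
$x \in \mathbb{R}^d$, we can obtain lower and upper bounds of the linear score $x^\top \beta^*_{\rm new}$ by setting $\eta = x$. From (7),
we can know the new class label if the signs of the lower and the upper bounds are the same.

Corollary 4. Let $x \in \mathbb{R}^d$ be an arbitrary $d$-dimensional input vector. Then, the classification result

$$\hat{y} := {\rm sgn}(f(x; \beta^*_{\rm new})) = {\rm sgn}(x^\top \beta^*_{\rm new})$$

satisfies

$$\hat{y} = \begin{cases} +1 & \text{if } L(x^\top \beta^*_{\rm new}) > 0, \\ -1 & \text{if } U(x^\top \beta^*_{\rm new}) < 0, \\ \text{unknown} & \text{otherwise}, \end{cases}$$

where

$$L(x^\top \beta^*_{\rm new}) := \frac{n_{\rm new} + n_{\rm old}}{2 n_{\rm new}} x^\top \beta^*_{\rm old} - \lambda^{-1} \frac{n_A + n_R}{2 n_{\rm new}} x^\top \Delta s - \frac{1}{2} \|x\| \left\| \frac{n_A - n_R}{n_{\rm new}} \beta^*_{\rm old} + \lambda^{-1} \frac{n_A + n_R}{n_{\rm new}} \Delta s \right\|,$$

$$U(x^\top \beta^*_{\rm new}) := \frac{n_{\rm new} + n_{\rm old}}{2 n_{\rm new}} x^\top \beta^*_{\rm old} - \lambda^{-1} \frac{n_A + n_R}{2 n_{\rm new}} x^\top \Delta s + \frac{1}{2} \|x\| \left\| \frac{n_A - n_R}{n_{\rm new}} \beta^*_{\rm old} + \lambda^{-1} \frac{n_A + n_R}{n_{\rm new}} \Delta s \right\|.$$

Corollary 4 is useful in transductive setups, where we are only interested in the class labels of a
prespecified set of test inputs.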
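The three-way decision in Corollary 4 can be written as a tiny helper. In the Python sketch below, the function name `label_from_bounds` is hypothetical and `None` is used as an illustrative stand-in for the "unknown" outcome.

```python
def label_from_bounds(lower, upper):
    """Return +1, -1, or None (label unknown) following Corollary 4."""
    if lower > 0:        # whole interval is positive
        return 1
    if upper < 0:        # whole interval is negative
        return -1
    return None          # interval contains zero: sign not determined
```

In a transductive setup one would apply this to every prespecified test input and fall back to re-training (or to the refinement in §4) only for the inputs that return `None`.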


3.3 Quick leave-one-out cross-validation 

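In LOOCV, we repeatedly leave out a single instance from the training set, and check whether it is correctly
classified or not by the new classifier, which is trained without the left-out instance. Thus, each step of
LOOCV computation can be considered as an incremental operation with $n_A = 0$ and $n_R = 1$. Denoting
the left-out instance as $(x_h, y_h)$, $h \in [n_{\rm old}]$, the task is to determine whether the left-out instance is correctly
classified or not, which is known by checking the sign of $y_h f(x_h; \beta^*_{\rm new}) = y_h x_h^\top \beta^*_{\rm new}$.

Corollary 5. Consider a single step of LOOCV computation where an instance $(x_h, y_h)$, $h \in [n_{\rm old}]$, is left
out. Then,

$$L(y_h f(x_h; \beta^*_{\rm new})) > 0 \ \Rightarrow\ (x_h, y_h) \text{ is correctly classified},$$

$$U(y_h f(x_h; \beta^*_{\rm new})) < 0 \ \Rightarrow\ (x_h, y_h) \text{ is mis-classified},$$

where

$$L(y_h f(x_h; \beta^*_{\rm new})) := \frac{2 n_{\rm old} - 1}{2 n_{\rm old} - 2} y_h x_h^\top \beta^*_{\rm old} + \frac{\lambda^{-1}}{2 n_{\rm old} - 2} y_h x_h^\top \nabla \ell_h(\beta^*_{\rm old}) - \frac{1}{2} \|x_h\| \left\| \frac{1}{n_{\rm old} - 1} \beta^*_{\rm old} + \frac{\lambda^{-1}}{n_{\rm old} - 1} \nabla \ell_h(\beta^*_{\rm old}) \right\|, \tag{15a}$$

$$U(y_h f(x_h; \beta^*_{\rm new})) := \frac{2 n_{\rm old} - 1}{2 n_{\rm old} - 2} y_h x_h^\top \beta^*_{\rm old} + \frac{\lambda^{-1}}{2 n_{\rm old} - 2} y_h x_h^\top \nabla \ell_h(\beta^*_{\rm old}) + \frac{1}{2} \|x_h\| \left\| \frac{1}{n_{\rm old} - 1} \beta^*_{\rm old} + \frac{\lambda^{-1}}{n_{\rm old} - 1} \nabla \ell_h(\beta^*_{\rm old}) \right\|. \tag{15b}$$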
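Corollary 5 is Theorem 1 specialized to $n_A = 0$, $n_R = 1$ and $\eta = y_h x_h$, for which $\Delta s = -\nabla \ell_h(\beta^*_{\rm old})$. The Python sketch below evaluates the resulting bounds for one left-out instance; the function name `loo_score_bounds` is hypothetical and the gradient of the left-out instance at $\beta^*_{\rm old}$ is assumed to be supplied by the caller.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def loo_score_bounds(x_h, y_h, beta_old, grad_h, n_old, lam):
    """Return (L, U) for y_h * x_h^T beta_{(-h)} as in (15a)-(15b).

    grad_h is the loss gradient of instance h evaluated at beta_old.
    """
    n_new = n_old - 1
    center = ((2 * n_old - 1) / (2 * n_new)) * y_h * dot(x_h, beta_old) \
        + (1.0 / (2 * lam * n_new)) * y_h * dot(x_h, grad_h)
    v = [b + g / lam for b, g in zip(beta_old, grad_h)]
    # ||y_h x_h|| = ||x_h|| because y_h is +1 or -1.
    radius = 0.5 * math.sqrt(dot(x_h, x_h)) * math.sqrt(dot(v, v)) / n_new
    return center - radius, center + radius
```

Each left-out instance therefore costs only $O(d)$, independently of $n_{\rm old}$.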


4 Tightening linear score bounds via a suboptimal solution 


In the previous section we introduced a framework that can quickly compute lower and upper bounds of
a linear score of the new classifier. Unfortunately, these bounds are not always sufficiently
tight for making a desired inference on the new classifier. For example, if the lower and the upper bounds
of $x^\top \beta^*_{\rm new}$ do not have the same sign for a test input $x$, we cannot tell which class it would be classified into.
In this section we discuss how to deal with such a situation.

The simplest way to handle such a situation is just to use conventional incremental learning algorithms.
If we completely solve the optimization problem (2) by an incremental learning algorithm, we can obtain
$\beta^*_{\rm new}$ itself. However, if our goal is only to make a particular inference about the new classifier, we do not
have to solve the optimization problem (2) completely until convergence. In this section we propose a similar
framework for computing lower and upper bounds of a linear score by using a suboptimal solution before
convergence, which would be obtained during the optimization of problem (2).

We denote such a suboptimal solution as $\hat{\beta}_{\rm new}$. In order to compute the bounds, we use the gradient
information of the problem (2), which we denote

$$g(\hat{\beta}_{\rm new}) := \frac{1}{n_{\rm new}} \sum_{i \in \mathcal{D}_{\rm new}} \nabla \ell_i(\hat{\beta}_{\rm new}) + \lambda \hat{\beta}_{\rm new}. \tag{16}$$

The complexity of computing the gradient vector from scratch is $O(n_{\rm new} d)$. However, if we are using a
gradient-based optimization algorithm such as conjugate gradient or quasi-Newton methods, we should have
already computed the gradient vector in each iteration of the optimization algorithm. The following theorem
provides lower and upper bounds of a linear score by using the current gradient information. If we have already
computed $g(\hat{\beta}_{\rm new})$, these bounds can be obtained very cheaply.


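Theorem 6. For an arbitrary vector $\eta \in \mathbb{R}^d$, the linear score $\eta^\top \beta^*_{\rm new}$ satisfies

$$\eta^\top \beta^*_{\rm new} \ge L(\eta^\top \beta^*_{\rm new}) := \eta^\top \hat{\beta}_{\rm new} - \frac{\lambda^{-1}}{2} \eta^\top g(\hat{\beta}_{\rm new}) - \frac{\lambda^{-1}}{2} \|\eta\| \|g(\hat{\beta}_{\rm new})\|, \tag{17a}$$

$$\eta^\top \beta^*_{\rm new} \le U(\eta^\top \beta^*_{\rm new}) := \eta^\top \hat{\beta}_{\rm new} - \frac{\lambda^{-1}}{2} \eta^\top g(\hat{\beta}_{\rm new}) + \frac{\lambda^{-1}}{2} \|\eta\| \|g(\hat{\beta}_{\rm new})\|. \tag{17b}$$

The proof is presented in the Appendix.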
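Given the current iterate and its gradient (16), the bounds in Theorem 6 need only two inner products and two norms. The Python sketch below shows this; the function name `suboptimal_score_bounds` is hypothetical, and the gradient is assumed to be the one already maintained by whatever gradient-based solver is in use.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def suboptimal_score_bounds(eta, beta_hat, grad, lam):
    """Return (L, U) for eta^T beta_new from a suboptimal beta_hat, Eqs. (17a)-(17b).

    grad is g(beta_hat) of Eq. (16): the full gradient of the new objective.
    """
    center = dot(eta, beta_hat) - dot(eta, grad) / (2 * lam)
    radius = math.sqrt(dot(eta, eta)) * math.sqrt(dot(grad, grad)) / (2 * lam)
    return center - radius, center + radius
```

When the gradient vanishes, both bounds coincide with $\eta^\top \hat{\beta}_{\rm new}$, which is then the optimal score.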

A nice property of the bounds in (17) is that the tightness

$$U(\eta^\top \beta^*_{\rm new}) - L(\eta^\top \beta^*_{\rm new}) = \lambda^{-1} \|\eta\| \|g(\hat{\beta}_{\rm new})\|$$

is linear in the norm of the gradient $\|g(\hat{\beta}_{\rm new})\|$. This means that, as the optimization algorithm for (2)
proceeds, the gap between the lower and the upper bounds in (17) decreases, and it converges to zero as
the solution converges to the optimal one. Theorem 6 can be used as a stopping criterion for the incremental
learning optimization problem (2). For example, in a sensitivity analysis of class labels, one can continue the
optimization process until the signs of the lower and the upper bounds in (17) become the same.

Table 1: Benchmark datasets used in the experiments.

      dataset name    n_train        d               n_test
D1    sonar           208            60              not used
D2    splice          1000           60              not used
D3    w5a             9888           300             not used
D4    a7a             16100          123             not used
D5    a9a             32561          123             16281
D6    ijcnn           49990          22              91701
D7    cod-rna         59535          8               271617
D8    kdd2010         > 8 million    > 20 million    > 0.5 million

5 Numerical experiments 

In this section we describe numerical experiments. In §5.1 we illustrate the tightness and the computational
efficiency of our bounds in the two sensitivity analysis tasks described in §2.2 and §2.3. In §5.2 we apply our
framework to LOOCV computation as described in §2.4 and compare its performance with conventional
LOOCV computation methods.

Table 1 summarizes the datasets used in the experiments. They are all taken from the libsvm dataset
repository [4]. For the experiments in §5.1 we used the larger datasets D5-D8. For the LOOCV experiments in §5.2
we used the smaller datasets D1-D4. As examples of the loss function $\ell$, we used the LR loss (3) and the SVM loss (4). In
§5.1 we only show the results on logistic regression. In §5.2 we compare our results on SVMs with conventional
LOOCV methods designed specifically for SVMs. For logistic regression, we only compare our framework
with a conventional incremental learning algorithm because there is no dedicated LOOCV computation method
for logistic regression. As the incremental learning algorithm, which is used both as a competitor and as a part of
our algorithm for LOOCV computation, we used the approach in [18]. All the computations were conducted
by using a single core of an HP workstation Z820 (Xeon(R) CPU E5-2693 (3.50GHz), 64GB MEM).

5.1 Results on two sensitivity analysis tasks 

Here we show the results on the two sensitivity analysis tasks described in §2.2 and §2.3. We empirically evaluate
the tightness of the bounds and the computational costs for the larger datasets D5-D8. First, we see how the
results change as the number of added and/or removed instances varies over $n_A + n_R \in \{0.01\%, 0.02\%,
0.05\%, 0.1\%, 0.2\%, 0.5\%, 1\%\}$ of the entire training set size $n_{\rm old}$. Next, we see the results when
the training set size varies over $n_{\rm old} \in \{10\%, 20\%, \ldots, 90\%, 99\%\}$ of $n_{\rm train}$, while the number
of added and/or removed instances is fixed to $n_A + n_R = 0.001\, n_{\rm train}$.
Algorithm 1 Proposed LOOCV method (op1)

Input: training set $\{(x_i, y_i)\}_{i \in [n_{\rm old}]}$ and regularization parameter $\lambda$
  $\beta^*_{\rm old} \leftarrow$ solve (1), $h \leftarrow 1$, err $\leftarrow 0$
  while $h \le n_{\rm old}$ do
    if $U(y_h f(x_h; \beta^*_{(-h)})) < 0$ then
      err $\leftarrow$ err + 1
    else if $L(y_h f(x_h; \beta^*_{(-h)})) < 0$ then
      $\beta^*_{(-h)} \leftarrow$ solve (2) by incremental learning algorithm
      if $y_h x_h^\top \beta^*_{(-h)} < 0$ then
        err $\leftarrow$ err + 1
      end if
    end if
    $h \leftarrow h + 1$
  end while
Output: LOOCV error: err$/n_{\rm old}$
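For readers who prefer executable code, Algorithm 1 can be sketched in Python as follows. This is only an illustration under explicit assumptions: the logistic loss (3) is used, and `fit_logreg` is a hypothetical stand-in solver (plain gradient descent with a warm start) that replaces the incremental learning algorithm of [18], which is not reproduced here. All function names are introduced for this sketch only.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def grad_logistic(x, y, beta):
    """Gradient of log(1 + exp(-y x^T beta)) w.r.t. beta (loss (3))."""
    m = y * dot(x, beta)
    if m > 0:
        e = math.exp(-m)
        c = -y * e / (1.0 + e)
    else:
        c = -y / (1.0 + math.exp(m))
    return [c * xj for xj in x]


def fit_logreg(X, Y, lam, init=None, iters=300):
    """Hypothetical stand-in solver: gradient descent on the regularized objective."""
    n, d = len(X), len(X[0])
    # Step size 1 / (smoothness constant of the objective).
    step = 1.0 / (lam + max(dot(x, x) for x in X) / 4.0)
    beta = list(init) if init is not None else [0.0] * d
    for _ in range(iters):
        g = [lam * b for b in beta]
        for x, y in zip(X, Y):
            gi = grad_logistic(x, y, beta)
            g = [a + b / n for a, b in zip(g, gi)]
        beta = [b - step * a for b, a in zip(beta, g)]
    return beta


def quick_loocv(X, Y, lam):
    """Algorithm 1 (op1): returns (LOOCV error, number of re-trainings)."""
    n = len(X)
    beta_old = fit_logreg(X, Y, lam)
    err, n_solved = 0, 0
    for h in range(n):
        x, y = X[h], Y[h]
        g = grad_logistic(x, y, beta_old)
        # Bounds (15a)-(15b) on y_h x_h^T beta_(-h), computed from beta_old only.
        center = ((2 * n - 1) / (2 * n - 2)) * y * dot(x, beta_old) \
            + y * dot(x, g) / (2 * lam * (n - 1))
        v = [b + gj / lam for b, gj in zip(beta_old, g)]
        radius = 0.5 * math.sqrt(dot(x, x) * dot(v, v)) / (n - 1)
        if center + radius < 0:        # upper bound negative: surely misclassified
            err += 1
        elif center - radius < 0:      # sign undetermined: re-train without h
            beta_h = fit_logreg(X[:h] + X[h + 1:], Y[:h] + Y[h + 1:], lam, init=beta_old)
            n_solved += 1
            if y * dot(x, beta_h) < 0:
                err += 1
    return err / n, n_solved
```

The second return value counts how many left-out instances actually required re-training, which is the quantity the bounds are meant to keep small.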


In the first sensitivity analysis task about coefficients (see §2.2 and §3.1), we simply computed the
difference between the upper and the lower bounds $U(\beta^*_{{\rm new},j}) - L(\beta^*_{{\rm new},j})$ for evaluating the tightness of the
bounds. For the second sensitivity analysis task about class labels (see §2.3 and §3.2), we examined the
percentage of the test instances for which the signs of the lower and the upper bounds are the same. Recall
that the class label is immediately available when the lower and the upper bounds have the same sign.

The tables for the former task show the corresponding results (Figure 3 depicts the results on D8 as an example).
These results indicate that the bounds are fairly tight if $n_A + n_R$ is small relative to $n_{\rm old}$. The
computational costs of our proposed framework (blue thick curves) are negligible compared with the costs
of actual incremental learning (red thick curves).

The tables for the latter task show the corresponding results (a figure depicts the results on D8 as an example). The
results here indicate that, in most cases, the bounds are sufficiently tight for the signs of the lower
and the upper bounds to be the same. This means that, in most cases, the new class labels after the incremental operation
are available without actually updating the classifier itself.

The results presented here were obtained with the regularization parameter $\lambda = 0.01, 0.1, 1$. We observed
that, for larger $\lambda$, the bounds became tighter. All experimental results are reported as the mean and variance
over 30 runs.

5.2 Leave-one-out cross-validation 

We applied the proposed framework to the LOOCV task and compared its computational efficiency with existing 
approaches. We consider two options. In the first option (op1), we used only the method described in §3.3. 
In the second option (op2), we also used the method described in §[…]. Algorithm 1 is the pseudo-code for 
computing LOOCV errors by using the proposed framework with op1. Briefly speaking, for each 
left-out instance $(x_h, y_h)$, we first compute the lower and the upper bounds of $y_h f(x_h; \beta^*_{\mathrm{new}})$. Then, if 
the lower bound is greater than zero, we confirm that the instance is correctly classified, and if the upper 
bound is smaller than zero, we confirm that it is incorrectly classified. When the signs of the two bounds 
are different, the class label is unknown. In such a case, we use the conventional incremental learning 
algorithm in [18]. In op1, we ran the incremental learning algorithm until convergence and obtained 
$\beta^*_{\mathrm{new}}$ itself. In op2, we stopped the optimization process as soon as the signs of the lower and the upper 
bounds became the same. 
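
The op1 procedure described above can be sketched as follows. This is not the paper's Algorithm 1 but a hedged illustration of the control flow only: `bounds_fn` and `refit_margin_fn` are hypothetical callbacks standing in for the bound computation and for the incremental solver of [18], and counting a margin of exactly zero as an error is an assumption.

```python
def loocv_error(n, bounds_fn, refit_margin_fn):
    """Estimate the LOOCV error, refitting only when the bounds cannot decide.

    bounds_fn(h) -> (lower, upper): bounds on y_h * f(x_h) for the classifier
        trained without instance h (hypothetical callback).
    refit_margin_fn(h) -> float: exact value of y_h * f(x_h) obtained by an
        incremental solver (hypothetical callback, assumed to be expensive).
    Returns (error_rate, number_of_refits).
    """
    errors = 0
    refits = 0
    for h in range(n):
        lower, upper = bounds_fn(h)
        if lower > 0:
            continue                 # certainly classified correctly
        if upper < 0:
            errors += 1              # certainly misclassified
            continue
        refits += 1                  # signs differ: fall back to the solver
        if refit_margin_fn(h) <= 0:  # assumption: a zero margin counts as an error
            errors += 1
    return errors / n, refits
```

The second return value makes it easy to see how often the expensive fallback was needed.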

For SVMs, several LOOCV methods have been studied in the literature […]. For the experiments 
with the SVM loss, we thus compare our approach with the methods in […] and […]. The former approach 
merely exploits the fact that adding and/or removing non-support vectors does not change the classifier. 
The method called the $\xi$-$\alpha$ estimator [10] also provides a lower bound of $y_h f(x_h; \beta^*_{\mathrm{new}})$ without actually obtaining 
$\beta^*_{\mathrm{new}}$. For the experiments with the logistic regression loss, we compare our approaches only with the incremental 
learning approach [18], because there are no other competing methods. 

We used the above LOOCV computation methods in model selection tasks for linear and nonlinear 
classification problems. In the linear case, the task is to find the regularization parameter $\lambda$, chosen from a 
grid of powers of two whose largest value is $2^{0}$, that minimizes the LOOCV error. In the nonlinear case, we 
used Gaussian RBF features of the form $\phi_k(x) = \exp(-\gamma \|x - x_k\|^2)$, where the centers $x_k$, $k \in [100]$, were 
randomly selected from the $n_{\mathrm{old}}$ training instances. Here, the task is to select the combination of 
$(\lambda, \gamma)$, from a two-dimensional grid of powers of two, that minimizes the LOOCV error. 
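
As an illustration of the nonlinear setting, the Gaussian RBF feature map can be computed as below. This is a sketch under stated assumptions: the data are synthetic, the value of `gamma` is arbitrary, sampling the 100 centers without replacement is a guess, and the helper name `rbf_features` is hypothetical.

```python
import numpy as np

def rbf_features(X, centers, gamma):
    """Map each row x of X to (exp(-gamma * ||x - x_k||^2))_k for the given centers."""
    # Pairwise squared Euclidean distances, shape (n_samples, n_centers).
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                           # synthetic stand-in for the training inputs
idx = rng.choice(X.shape[0], size=100, replace=False)   # 100 randomly chosen centers (assumption: no repeats)
Phi = rbf_features(X, X[idx], gamma=0.5)                # gamma chosen arbitrarily for the demo
```

A linear classifier trained on `Phi` is then a nonlinear classifier in the original inputs, so the same bounds apply with $\phi(x)$ in place of $x$.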

For further speed-up, we also conducted experiments with two simple tricks. The first trick uses the 
lower and the upper bounds of the LOOCV error itself. If the lower bound for one model is greater than the 
upper bound for another model, the former model can never be selected as the best model, so 
its LOOCV error computation can be stopped. The second trick is to conduct the incremental 
learning operations in increasing order of $y_h f(x_h; \beta^*_{\mathrm{old}})$. It is based on the simple observation that an 
instance whose $y_h f(x_h; \beta^*_{\mathrm{old}})$ value is small tends to be misclassified. Note that these two 
tricks can be used not only with our proposed framework but also with the competing approaches. 
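
The two tricks can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the helper names are hypothetical, and the bounds on the LOOCV error are formed from the counts of confirmed errors and still-undetermined instances, which is one straightforward way to bound it.

```python
def loocv_error_bounds(n, confirmed_errors, undetermined):
    """Bounds on the LOOCV error when some left-out instances are not yet decided."""
    lower = confirmed_errors / n                   # every undetermined instance turns out correct
    upper = (confirmed_errors + undetermined) / n  # every undetermined instance turns out wrong
    return lower, upper

def prune_models(error_bounds):
    """First trick: keep only models whose lower bound does not exceed the best upper bound.

    error_bounds maps a model identifier to its (lower, upper) LOOCV error bounds.
    """
    best_upper = min(upper for _, upper in error_bounds.values())
    return [name for name, (lower, _) in error_bounds.items() if lower <= best_upper]

def refit_order(old_margins):
    """Second trick: process instances in increasing order of y_h * f(x_h; beta_old)."""
    return sorted(range(len(old_margins)), key=lambda h: old_margins[h])
```

Processing the most suspicious instances first tends to raise the lower bound of a poor model quickly, so that `prune_models` can discard it early.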

Tables 14 and 15 show the results without and with the tricks, respectively. We see that the computational 
cost of our proposed framework (especially op2) is much smaller than that of the competing methods. This indicates 
that our bounds in (15) are tighter in many cases than the existing bounds for LOOCV error computation. 

Regarding the first trick, note that once it is known whether some of the left-out instances are correctly 
classified or not, the LOOCV error itself can be bounded. 

6 Conclusions and future works 

In this paper we introduced a novel framework for sensitivity analysis of large-scale classification problems. 
The proposed framework provides lower and upper bounds of a general linear score on the updated 
classifier without actually re-optimizing it. The advantage of the proposed framework is that the computational 
cost depends only on the number of modified instances, which is particularly advantageous in typical 
sensitivity analysis tasks where only a relatively small number of instances are updated. We discussed three 
tasks to which the proposed framework can be applied. As future work, we plan to apply the proposed 
framework to stream learning. 

A Proofs 

In this section we prove Theorems 1 and 6. First, we present the following proposition. 

Proposition 7. Consider the following general problem: 

$$\min_{z} \ \phi(z) \quad \text{s.t.} \quad z \in \mathcal{Z}, \tag{18}$$

where $\phi : \mathcal{Z} \to \mathbb{R}$ is a differentiable convex function and $\mathcal{Z}$ is a convex set. Then a solution $z^*$ is the optimal 
solution of (18) if and only if 

$$\nabla \phi(z^*)^\top (z^* - z) \le 0 \quad \forall z \in \mathcal{Z},$$

where $\nabla \phi(z^*)$ is the gradient vector of $\phi$ at $z = z^*$. 

See, for example, Proposition 2.1.2 in [1] or Section 4.2.3 in [2] for the proof of Proposition 7. 

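Proof of Theorem 1. From Proposition 7 and the optimality of $\beta^*_{\mathrm{new}}$ for the new problem, 

$$\Big( \frac{1}{n_{\mathrm{new}}} \sum_{i \in \mathcal{D}_{\mathrm{new}}} \nabla \ell_i(\beta^*_{\mathrm{new}}) + \lambda \beta^*_{\mathrm{new}} \Big)^\top (\beta^*_{\mathrm{new}} - \beta^*_{\mathrm{old}}) \le 0. \tag{19}$$

From the convexity of $\ell_i$, 

$$\ell_i(\beta^*_{\mathrm{old}}) \ge \ell_i(\beta^*_{\mathrm{new}}) + \nabla \ell_i(\beta^*_{\mathrm{new}})^\top (\beta^*_{\mathrm{old}} - \beta^*_{\mathrm{new}}), \tag{20}$$

$$\ell_i(\beta^*_{\mathrm{new}}) \ge \ell_i(\beta^*_{\mathrm{old}}) + \nabla \ell_i(\beta^*_{\mathrm{old}})^\top (\beta^*_{\mathrm{new}} - \beta^*_{\mathrm{old}}). \tag{21}$$

Using (20) and (21), 

$$\nabla \ell_i(\beta^*_{\mathrm{new}})^\top (\beta^*_{\mathrm{new}} - \beta^*_{\mathrm{old}}) \ge \nabla \ell_i(\beta^*_{\mathrm{old}})^\top (\beta^*_{\mathrm{new}} - \beta^*_{\mathrm{old}}). \tag{22}$$

By summing up (22) for all $i \in \mathcal{D}_{\mathrm{new}}$, 

$$\sum_{i \in \mathcal{D}_{\mathrm{new}}} \nabla \ell_i(\beta^*_{\mathrm{new}})^\top (\beta^*_{\mathrm{new}} - \beta^*_{\mathrm{old}}) \ge \sum_{i \in \mathcal{D}_{\mathrm{new}}} \nabla \ell_i(\beta^*_{\mathrm{old}})^\top (\beta^*_{\mathrm{new}} - \beta^*_{\mathrm{old}}). \tag{23}$$

Substituting (23) into (19), 

$$\frac{1}{n_{\mathrm{new}}} \sum_{i \in \mathcal{D}_{\mathrm{new}}} \nabla \ell_i(\beta^*_{\mathrm{old}})^\top (\beta^*_{\mathrm{new}} - \beta^*_{\mathrm{old}}) + \lambda \beta^{*\top}_{\mathrm{new}} (\beta^*_{\mathrm{new}} - \beta^*_{\mathrm{old}}) \le 0. \tag{24}$$

By completing the square of (24), we have 

$$\Big\| \beta^*_{\mathrm{new}} - \frac{1}{2} \Big( \beta^*_{\mathrm{old}} - \frac{\lambda^{-1}}{n_{\mathrm{new}}} \sum_{i \in \mathcal{D}_{\mathrm{new}}} \nabla \ell_i(\beta^*_{\mathrm{old}}) \Big) \Big\|^2 \le \Big\| \frac{1}{2} \Big( \beta^*_{\mathrm{old}} + \frac{\lambda^{-1}}{n_{\mathrm{new}}} \sum_{i \in \mathcal{D}_{\mathrm{new}}} \nabla \ell_i(\beta^*_{\mathrm{old}}) \Big) \Big\|^2. \tag{25}$$

Furthermore, noting that $\beta^*_{\mathrm{old}}$ is the optimal solution of the old problem, 

$$\beta^*_{\mathrm{old}} + \frac{\lambda^{-1}}{n_{\mathrm{old}}} \sum_{i \in \mathcal{D}_{\mathrm{old}}} \nabla \ell_i(\beta^*_{\mathrm{old}}) = 0. \tag{26}$$

Using (26) and (9), 

$$\begin{aligned}
\frac{\lambda^{-1}}{n_{\mathrm{new}}} \sum_{i \in \mathcal{D}_{\mathrm{new}}} \nabla \ell_i(\beta^*_{\mathrm{old}})
&= \frac{\lambda^{-1}}{n_{\mathrm{new}}} \Big( \sum_{i \in \mathcal{D}_{\mathrm{old}}} \nabla \ell_i(\beta^*_{\mathrm{old}}) + \sum_{i \in \mathcal{D}_A} \nabla \ell_i(\beta^*_{\mathrm{old}}) - \sum_{i \in \mathcal{D}_R} \nabla \ell_i(\beta^*_{\mathrm{old}}) \Big) \\
&= \frac{\lambda^{-1}}{n_{\mathrm{new}}} \big( -\lambda n_{\mathrm{old}} \beta^*_{\mathrm{old}} + (n_A + n_R) \Delta s \big) \\
&= -\frac{n_{\mathrm{old}}}{n_{\mathrm{new}}} \beta^*_{\mathrm{old}} + \frac{(n_A + n_R) \lambda^{-1}}{n_{\mathrm{new}}} \Delta s.
\end{aligned} \tag{27}$$

Substituting (27) into (25), 

$$\Big\| \beta^*_{\mathrm{new}} - \Big( \frac{n_{\mathrm{old}} + n_{\mathrm{new}}}{2 n_{\mathrm{new}}} \beta^*_{\mathrm{old}} - \frac{(n_A + n_R) \lambda^{-1}}{2 n_{\mathrm{new}}} \Delta s \Big) \Big\|^2 \le \Big\| \frac{n_A - n_R}{2 n_{\mathrm{new}}} \beta^*_{\mathrm{old}} + \frac{(n_A + n_R) \lambda^{-1}}{2 n_{\mathrm{new}}} \Delta s \Big\|^2. \tag{28}$$

Let 

$$m := \frac{n_{\mathrm{old}} + n_{\mathrm{new}}}{2 n_{\mathrm{new}}} \beta^*_{\mathrm{old}} - \frac{(n_A + n_R) \lambda^{-1}}{2 n_{\mathrm{new}}} \Delta s, \qquad r := \Big\| \frac{n_A - n_R}{2 n_{\mathrm{new}}} \beta^*_{\mathrm{old}} + \frac{(n_A + n_R) \lambda^{-1}}{2 n_{\mathrm{new}}} \Delta s \Big\|. \tag{29}$$

Then, (28) is compactly written as 

$$\beta^*_{\mathrm{new}} \in \Omega, \quad \text{where } \Omega := \{ \beta \mid \|\beta - m\|^2 \le r^2 \}. \tag{30}$$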

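Eq. (30) indicates that the new optimal solution $\beta^*_{\mathrm{new}}$ is within a ball with center $m$ and radius $r$. Thus, 
we have a lower and an upper bound of a linear score $\eta^\top \beta^*_{\mathrm{new}}$ as follows: 

$$L(\eta^\top \beta^*_{\mathrm{new}}) := \min_{\beta \in \Omega} \eta^\top \beta, \tag{31}$$

$$U(\eta^\top \beta^*_{\mathrm{new}}) := \max_{\beta \in \Omega} \eta^\top \beta. \tag{32}$$

In fact, the solutions of (31) and (32) can be analytically obtained, and thus the lower bound $L(\eta^\top \beta^*_{\mathrm{new}})$ 
and the upper bound $U(\eta^\top \beta^*_{\mathrm{new}})$ can be explicitly obtained by using the Lagrange multiplier method. Using a 
Lagrange multiplier $\alpha > 0$, the problem (31) is rewritten as 

$$\begin{aligned}
\min_{\beta} \ \eta^\top \beta \quad \text{s.t.} \quad \|\beta - m\|^2 \le r^2
&= \min_{\beta} \max_{\alpha > 0} \big( \eta^\top \beta + \alpha (\|\beta - m\|^2 - r^2) \big) \\
&= \max_{\alpha > 0} \big( -\alpha r^2 + \min_{\beta} ( \alpha \|\beta - m\|^2 + \eta^\top \beta ) \big) \\
&= \max_{\alpha > 0} H(\alpha) := \Big( -\alpha r^2 - \frac{\|\eta\|^2}{4 \alpha} + \eta^\top m \Big),
\end{aligned}$$

where $\alpha$ is strictly positive because the constraint $\|\beta - m\|^2 \le r^2$ is strictly active at the optimal solution. 
By letting $\partial H(\alpha) / \partial \alpha = 0$, the optimal $\alpha$ is written as 

$$\alpha^* := \frac{\|\eta\|}{2 r} = \arg\max_{\alpha > 0} H(\alpha).$$

Substituting $\alpha^*$ into $H(\alpha)$, 

$$\eta^\top m - \|\eta\| r = \max_{\alpha > 0} H(\alpha).$$

Therefore, 

$$L(\eta^\top \beta^*_{\mathrm{new}}) = \min_{\beta \in \Omega} \eta^\top \beta = \eta^\top m - \|\eta\| r. \tag{33}$$

The upper bound of $\eta^\top \beta^*_{\mathrm{new}}$ in (32) can be similarly obtained as 

$$U(\eta^\top \beta^*_{\mathrm{new}}) = \max_{\beta \in \Omega} \eta^\top \beta = \eta^\top m + \|\eta\| r. \tag{34}$$

By substituting $m$ and $r$ in (29) into (33) and (34), we have (10a) and (10b). 

Theorem 6 can be shown in a similar way as above. 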
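
The ball argument above can be checked numerically. The sketch below is an illustration under loudly stated assumptions, not the authors' code: it uses the squared loss $\ell_i(\beta) = \frac{1}{2}(x_i^\top \beta - y_i)^2$ (convex and differentiable, chosen only because the optimum has a closed form), the objective $\frac{1}{n}\sum_i \ell_i(\beta) + \frac{\lambda}{2}\|\beta\|^2$ implied by (26), synthetic data, and hypothetical helper names. The vector `s` in the code equals $(n_A + n_R)\Delta s$ as it appears in (27).

```python
import numpy as np

def ball_bounds(beta_old, grad_added, grad_removed, n_old, lam, eta):
    """Lower and upper bounds of eta^T beta_new from the ball with center m and radius r.

    grad_added, grad_removed: arrays of shape (n_A, d) and (n_R, d) holding the
    per-instance loss gradients evaluated at beta_old.
    """
    n_a, n_r = len(grad_added), len(grad_removed)
    n_new = n_old + n_a - n_r
    # s = (n_A + n_R) * Delta s: gradients of added minus gradients of removed instances.
    s = grad_added.sum(axis=0) - grad_removed.sum(axis=0)
    m = (n_old + n_new) / (2 * n_new) * beta_old - s / (2 * lam * n_new)
    r = np.linalg.norm((n_a - n_r) / (2 * n_new) * beta_old + s / (2 * lam * n_new))
    center, half_width = eta @ m, np.linalg.norm(eta) * r
    return center - half_width, center + half_width

def ridge(X, y, lam):
    """Closed-form minimizer of (1/n) * sum 0.5*(x_i^T b - y_i)^2 + (lam/2)*||b||^2."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

def grads(X, y, beta):
    """Per-instance gradients of the squared loss, one row per instance."""
    return (X @ beta - y)[:, None] * X

# Synthetic demo (assumption): 200 old instances, 3 added, 2 removed.
rng = np.random.default_rng(0)
X_old, y_old = rng.normal(size=(200, 3)), rng.normal(size=200)
X_add, y_add = rng.normal(size=(3, 3)), rng.normal(size=3)
removed, lam = np.array([0, 1]), 0.1

beta_old = ridge(X_old, y_old, lam)
keep = np.setdiff1d(np.arange(200), removed)
beta_new = ridge(np.vstack([X_old[keep], X_add]), np.concatenate([y_old[keep], y_add]), lam)

eta = np.array([1.0, 0.0, 0.0])  # bounds on the first coefficient
lo, hi = ball_bounds(beta_old, grads(X_add, y_add, beta_old),
                     grads(X_old[removed], y_old[removed], beta_old),
                     n_old=200, lam=lam, eta=eta)
```

Only the gradients of the added and removed instances enter `ball_bounds`, which is why the cost of the bounds does not depend on $n_{\mathrm{old}}$; `beta_new` is computed here purely to verify that `lo <= eta @ beta_new <= hi`.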

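Proof of Theorem 6. From Proposition 7 and the optimality of $\beta^*_{\mathrm{new}}$, 

$$\Big( \frac{1}{n_{\mathrm{new}}} \sum_{i \in \mathcal{D}_{\mathrm{new}}} \nabla \ell_i(\beta^*_{\mathrm{new}}) + \lambda \beta^*_{\mathrm{new}} \Big)^\top (\beta^*_{\mathrm{new}} - \hat{\beta}_{\mathrm{new}}) \le 0. \tag{35}$$

From the convexity of $\ell_i$, 

$$\ell_i(\beta^*_{\mathrm{new}}) \ge \ell_i(\hat{\beta}_{\mathrm{new}}) + \nabla \ell_i(\hat{\beta}_{\mathrm{new}})^\top (\beta^*_{\mathrm{new}} - \hat{\beta}_{\mathrm{new}}), \tag{36}$$

$$\ell_i(\hat{\beta}_{\mathrm{new}}) \ge \ell_i(\beta^*_{\mathrm{new}}) + \nabla \ell_i(\beta^*_{\mathrm{new}})^\top (\hat{\beta}_{\mathrm{new}} - \beta^*_{\mathrm{new}}). \tag{37}$$

Using (36) and (37) and the definition of $g(\hat{\beta}_{\mathrm{new}})$, the inequality (35) can be rewritten as the following 
condition on the new optimal solution $\beta^*_{\mathrm{new}}$: 

$$\beta^*_{\mathrm{new}} \in \Omega, \tag{38}$$

where $\Omega$ denotes the set of all $\beta$ satisfying the rewritten inequality. Then, the lower and the upper bounds 
of $\eta^\top \beta^*_{\mathrm{new}}$ are obtained by solving the following minimization and maximization problems: 

$$L(\eta^\top \beta^*_{\mathrm{new}}) := \min_{\beta \in \Omega} \eta^\top \beta, \tag{39}$$

$$U(\eta^\top \beta^*_{\mathrm{new}}) := \max_{\beta \in \Omega} \eta^\top \beta. \tag{40}$$

Using the standard Lagrange multiplier method, the solutions of (39) and (40) are analytically obtained as 
(17a) and (17b), respectively. ■ 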
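
For orientation, the rewriting step in this proof can be worked out under one explicit assumption: that $g(\hat{\beta}_{\mathrm{new}})$ denotes the gradient of the new objective at $\hat{\beta}_{\mathrm{new}}$. The definition of $g$ is not restated in this appendix, so the derivation below is a sketch of how the argument would go under that assumption, and its final form may differ from (17a) and (17b) as stated in the main text.

```latex
% Assumption: g(\hat\beta) := \frac{1}{n_{new}} \sum_{i \in D_{new}} \nabla \ell_i(\hat\beta) + \lambda \hat\beta.
% Adding (36) and (37) gives, for every i,
\nabla \ell_i(\beta^*)^\top (\beta^* - \hat\beta) \;\ge\; \nabla \ell_i(\hat\beta)^\top (\beta^* - \hat\beta).
% Substituting this into (35) and using the assumed definition of g:
\big( g(\hat\beta) - \lambda \hat\beta + \lambda \beta^* \big)^\top (\beta^* - \hat\beta) \le 0
\;\Longleftrightarrow\;
\lambda \|\beta^* - \hat\beta\|^2 + g(\hat\beta)^\top (\beta^* - \hat\beta) \le 0.
% Completing the square yields a ball:
\Big\| \beta^* - \Big( \hat\beta - \tfrac{1}{2\lambda} g(\hat\beta) \Big) \Big\|^2 \le \Big\| \tfrac{1}{2\lambda} g(\hat\beta) \Big\|^2.
% Minimizing and maximizing \eta^\top \beta over this ball, as in the proof of Theorem 1:
\eta^\top \hat\beta - \tfrac{1}{2\lambda} \eta^\top g(\hat\beta) \mp \tfrac{1}{2\lambda} \|\eta\| \, \|g(\hat\beta)\|.
```

Under this assumption the radius shrinks to zero as $g(\hat{\beta}_{\mathrm{new}}) \to 0$, which matches the use of these bounds in op2, where the optimization is stopped as soon as the two bounds agree in sign.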
Figure 3: Results on sensitivity analysis of coefficients for D8. The tightness of the bounds and the computation 
time in seconds are plotted for $\lambda = 0.01$ (top), $0.1$ (middle), and $1$ (bottom). 

Figure 4: Results on sensitivity analysis of class labels for D8. The fraction of the test instances whose lower 
and upper bounds of the decision score have the same sign, and the computation time in seconds, are plotted 
for $\lambda = 0.01$ (top), $0.1$ (middle), and $1$ (bottom). 

Table 2: Results on sensitivity analysis of coefficients for various values of $(n_A + n_R)/n_{\mathrm{old}}$. The tightness of 
the bounds and the computation time in seconds are listed ($\lambda = 0.01$). 



| Dataset | Quantity | 0.01%: Incremental | 0.01%: proposed | 0.1%: Incremental | 0.1%: proposed | 1%: Incremental | 1%: proposed |
|---|---|---|---|---|---|---|---|
| D5 | tightness | N.A. | 7.55e-04 (±3.74e-04) | N.A. | 2.27e-03 (±1.26e-03) | N.A. | 6.49e-03 (±1.83e-03) |
| D5 | time [sec] | 3.03e-02 (±3.24e-03) | 4.13e-06 (±4.27e-07) | 4.29e-02 (±4.89e-03) | 2.21e-05 (±6.29e-07) | 4.45e-02 (±1.85e-03) | 1.94e-04 (±8.23e-07) |
| D6 | tightness | N.A. | 1.86e-04 (±3.86e-05) | N.A. | 6.54e-04 (±1.10e-04) | N.A. | 2.09e-03 (±3.62e-04) |
| D6 | time [sec] | 3.17e-02 (±7.71e-03) | 4.63e-06 (±8.36e-07) | 6.49e-02 (±6.65e-03) | 3.09e-05 (±1.45e-06) | 6.60e-02 (±7.14e-03) | 2.80e-04 (±6.50e-06) |
| D7 | tightness | N.A. | 2.14e-04 (±7.57e-05) | N.A. | 6.60e-04 (±2.80e-04) | N.A. | 2.34e-03 (±9.89e-04) |
| D7 | time [sec] | 8.41e-02 (±1.47e-02) | 6.30e-06 (±9.36e-07) | 1.02e-01 (±1.45e-02) | 3.08e-05 (±4.94e-06) | 1.13e-01 (±1.42e-02) | 2.31e-04 (±1.43e-05) |
| D8 | tightness | N.A. | 6.56e-05 (±1.65e-06) | N.A. | 2.07e-04 (±3.94e-06) | N.A. | 6.50e-04 (±1.57e-05) |
| D8 | time [sec] | 4.11e+01 (±4.95e+00) | 1.61e-01 (±1.03e-02) | 5.96e+01 (±1.50e+00) | 1.72e-01 (±8.33e-03) | 7.11e+01 (±9.80e+00) | 3.53e-01 (±1.15e-02) |

Table 3: Results on sensitivity analysis of coefficients for various values of $(n_A + n_R)/n_{\mathrm{old}}$ (column groups). The tightness of 
the bounds and the computation time in seconds are listed ($\lambda = 0.1$). 

Table 4: Results on sensitivity analysis of coefficients for various values of $(n_A + n_R)/n_{\mathrm{old}}$. The tightness of 
the bounds and the computation time in seconds are listed ($\lambda = 1$). 



| Dataset | Quantity | 10%: Incremental | 10%: proposed | 50%: Incremental | 50%: proposed | 99%: Incremental | 99%: proposed |
|---|---|---|---|---|---|---|---|
| D5 | tightness | N.A. | 1.92e-01 (±5.96e-02) | N.A. | 3.41e-02 (±1.29e-02) | N.A. | 2.02e-02 (±6.57e-03) |
| D5 | time [sec] | 6.88e-03 (±1.77e-04) | 2.14e-05 (±6.05e-07) | 4.12e-02 (±1.01e-02) | 2.41e-05 (±1.06e-06) | 7.58e-02 (±1.14e-02) | 2.64e-05 (±1.30e-06) |
| D6 | tightness | N.A. | 5.07e-02 (±1.20e-02) | N.A. | 8.81e-03 (±1.84e-03) | N.A. | 5.13e-03 (±1.25e-03) |
| D6 | time [sec] | 5.27e-03 (±3.00e-04) | 2.89e-05 (±5.73e-07) | 5.24e-02 (±1.32e-02) | 3.36e-05 (±1.28e-06) | 1.01e-01 (±1.63e-02) | 3.49e-05 (±1.36e-06) |
| D7 | tightness | N.A. | 6.55e-02 (±2.83e-02) | N.A. | 1.10e-02 (±4.27e-03) | N.A. | 6.81e-03 (±3.09e-03) |
| D7 | time [sec] | 6.96e-03 (±5.53e-04) | 2.50e-05 (±5.46e-06) | 1.12e-01 (±1.64e-02) | 3.06e-05 (±1.76e-06) | 1.80e-01 (±2.87e-02) | 3.05e-05 (±2.22e-06) |
| D8 | tightness | N.A. | 9.34e-03 (±1.68e-04) | N.A. | 1.57e-03 (±1.95e-05) | N.A. | 9.54e-04 (±1.73e-05) |
| D8 | time [sec] | 3.41e+01 (±3.16e+00) | 1.63e-01 (±8.33e-03) | 9.83e+01 (±9.30e+00) | 1.77e-01 (±1.11e-02) | 1.11e+02 (±9.91e+00) | 1.60e-01 (±5.66e-03) |

Table 5: Results on sensitivity analysis of coefficients for various values of $n_{\mathrm{old}}/n_{\mathrm{train}}$ (column groups). The tightness of the 
bounds and the computation time in seconds are listed ($\lambda = 0.01$). 

Table 6: Results on sensitivity analysis of coefficients for various values of $n_{\mathrm{old}}/n_{\mathrm{train}}$. The tightness of the 
bounds and the computation time in seconds are listed ($\lambda = 0.1$). 



| Dataset | Quantity | 10%: Incremental | 10%: proposed | 50%: Incremental | 50%: proposed | 99%: Incremental | 99%: proposed |
|---|---|---|---|---|---|---|---|
| D5 | tightness | N.A. | 2.48e-03 (±7.00e-04) | N.A. | 4.22e-04 (±1.17e-04) | N.A. | 2.61e-04 (±7.27e-05) |
| D5 | time [sec] | 2.42e-03 (±2.16e-04) | 2.05e-05 (±6.71e-07) | 1.50e-02 (±2.38e-03) | 2.51e-05 (±1.51e-06) | 3.01e-02 (±5.83e-03) | 2.79e-05 (±2.31e-06) |
| D6 | tightness | N.A. | 7.67e-04 (±1.42e-04) | N.A. | 1.25e-04 (±2.32e-05) | N.A. | 8.32e-05 (±1.77e-05) |
| D6 | time [sec] | 2.66e-03 (±2.74e-04) | 2.92e-05 (±2.48e-06) | 2.18e-02 (±4.72e-03) | 3.52e-05 (±2.26e-06) | 3.92e-02 (±6.66e-03) | 3.62e-05 (±3.42e-06) |
| D7 | tightness | N.A. | 7.40e-04 (±2.81e-04) | N.A. | 1.27e-04 (±4.66e-05) | N.A. | 7.09e-05 (±2.39e-05) |
| D7 | time [sec] | 3.27e-03 (±6.01e-04) | 2.25e-05 (±7.19e-07) | 2.66e-02 (±7.92e-03) | 2.90e-05 (±1.60e-06) | 3.71e-02 (±8.81e-03) | 2.97e-05 (±2.27e-06) |
| D8 | tightness | N.A. | 2.85e-04 (±9.97e-06) | N.A. | 4.77e-05 (±1.18e-06) | N.A. | 2.91e-05 (±1.06e-06) |
| D8 | time [sec] | 6.89e+00 (±2.93e-02) | 1.64e-01 (±8.92e-03) | 1.89e+01 (±3.32e+00) | 1.59e-01 (±6.00e-03) | 2.96e+01 (±3.07e+00) | 1.80e-01 (±2.07e-02) |

Table 7: Results on sensitivity analysis of coefficients for various values of $n_{\mathrm{old}}/n_{\mathrm{train}}$ (column groups). The tightness of the 
bounds and the computation time in seconds are listed ($\lambda = 1$). 

| Dataset | Quantity | 0.01%: Incremental | 0.01%: proposed | 0.1%: Incremental | 0.1%: proposed | 1%: Incremental | 1%: proposed |
|---|---|---|---|---|---|---|---|
| D5 | fraction of "same sign" | N.A. | 9.96345e-01 (±1.68e-03) | N.A. | 9.88742e-01 (±4.33e-03) | N.A. | 9.65412e-01 (±1.01e-02) |
| D5 | time [sec] | 6.80e-02 (±1.09e-02) | 4.15e-04 (±1.90e-05) | 7.89e-02 (±1.78e-02) | 4.36e-04 (±1.04e-05) | 8.82e-02 (±1.98e-02) | 6.38e-04 (±3.84e-05) |
| D6 | fraction of "same sign" | N.A. | 1.00000e+00 (±0.00e+00) | N.A. | 1.00000e+00 (±0.00e+00) | N.A. | 1.00000e+00 (±0.00e+00) |
| D6 | time [sec] | 1.13e-01 (±2.13e-02) | 2.80e-03 (±1.50e-04) | 1.10e-01 (±1.38e-02) | 2.86e-03 (±1.63e-04) | 1.37e-01 (±1.37e-02) | 3.14e-03 (±1.20e-04) |
| D7 | fraction of "same sign" | N.A. | 9.99728e-01 (±5.51e-05) | N.A. | 9.99354e-01 (±1.71e-04) | N.A. | 9.98612e-01 (±4.63e-04) |
| D7 | time [sec] | 1.40e-01 (±3.26e-02) | 5.30e-03 (±4.20e-04) | 1.55e-01 (±3.42e-02) | 5.15e-03 (±1.31e-04) | 1.96e-01 (±2.45e-02) | 5.76e-03 (±4.57e-04) |
| D8 | fraction of "same sign" | N.A. | 9.99869e-01 (±7.49e-06) | N.A. | 9.99583e-01 (±1.76e-05) | N.A. | 9.98672e-01 (±9.61e-05) |
| D8 | time [sec] | 1.25e+02 (±7.47e+00) | 1.40e-01 (±9.27e-03) | 1.48e+02 (±1.80e+01) | 1.67e-01 (±8.01e-03) | 1.67e+02 (±1.25e+01) | 3.49e-01 (±2.14e-02) |

Table 8: Results on sensitivity analysis of class labels for various values of $(n_A + n_R)/n_{\mathrm{old}}$ (column groups). The fraction of 
the test instances whose lower and upper bounds of the decision score have the same sign, and the computation 
time in seconds, are listed ($\lambda = 0.01$). 




| Dataset | Quantity | 0.01%: Incremental | 0.01%: proposed | 0.1%: Incremental | 0.1%: proposed | 1%: Incremental | 1%: proposed |
|---|---|---|---|---|---|---|---|
| D5 | fraction of "same sign" | N.A. | 9.99449e-01 (±1.67e-04) | N.A. | 9.97822e-01 (±8.77e-04) | N.A. | 9.95043e-01 (±1.16e-03) |
| D5 | time [sec] | 4.56e-02 (±1.23e-02) | 4.20e-04 (±3.36e-05) | 5.54e-02 (±1.59e-02) | 4.38e-04 (±1.90e-05) | 6.24e-02 (±1.50e-02) | 6.47e-04 (±3.49e-05) |
| D6 | fraction of "same sign" | N.A. | 1.00000e+00 (±0.00e+00) | N.A. | 1.00000e+00 (±0.00e+00) | N.A. | 1.00000e+00 (±0.00e+00) |
| D6 | time [sec] | 7.37e-02 (±1.87e-02) | 2.70e-03 (±3.38e-04) | 7.02e-02 (±1.52e-02) | 2.55e-03 (±1.60e-04) | 7.54e-02 (±1.47e-02) | 2.90e-03 (±2.06e-04) |
| D7 | fraction of "same sign" | N.A. | 1.00000e+00 (±0.00e+00) | N.A. | 1.00000e+00 (±0.00e+00) | N.A. | 1.00000e+00 (±1.25e-06) |
| D7 | time [sec] | 6.46e-02 (±1.84e-02) | 4.98e-03 (±3.80e-04) | 6.93e-02 (±1.61e-02) | 4.99e-03 (±2.89e-04) | 7.71e-02 (±1.60e-02) | 5.14e-03 (±2.40e-04) |
| D8 | fraction of "same sign" | N.A. | 9.99964e-01 (±2.19e-06) | N.A. | 9.99952e-01 (±1.15e-06) | N.A. | 9.99903e-01 (±2.55e-06) |
| D8 | time [sec] | 6.15e+01 (±6.23e+00) | 1.38e-01 (±8.46e-03) | 6.41e+01 (±1.06e+00) | 1.56e-01 (±4.81e-03) | 6.95e+01 (±7.68e+00) | 3.41e-01 (±1.07e-02) |

Table 9: Results on sensitivity analysis of class labels for various values of $(n_A + n_R)/n_{\mathrm{old}}$ (column groups). The fraction of 
the test instances whose lower and upper bounds of the decision score have the same sign, and the computation 
time in seconds, are listed ($\lambda = 0.1$). 

| Dataset | Quantity | 0.01%: Incremental | 0.01%: proposed | 0.1%: Incremental | 0.1%: proposed | 1%: Incremental | 1%: proposed |
|---|---|---|---|---|---|---|---|
| D5 | fraction of "same sign" | N.A. | 1.00000e+00 (±0.00e+00) | N.A. | 1.00000e+00 (±0.00e+00) | N.A. | 1.00000e+00 (±0.00e+00) |
| D5 | time [sec] | 1.51e-02 (±8.36e-04) | 3.86e-04 (±5.01e-06) | 2.01e-02 (±2.77e-03) | 4.18e-04 (±3.56e-06) | 2.25e-02 (±7.19e-04) | 5.54e-04 (±6.64e-06) |
| D6 | fraction of "same sign" | N.A. | 1.00000e+00 (±0.00e+00) | N.A. | 1.00000e+00 (±0.00e+00) | N.A. | 1.00000e+00 (±0.00e+00) |
| D6 | time [sec] | 2.14e-02 (±4.13e-05) | 2.03e-03 (±3.99e-05) | 2.35e-02 (±2.31e-03) | 2.13e-03 (±8.39e-05) | 2.52e-02 (±4.43e-04) | 2.41e-03 (±3.49e-05) |
| D7 | fraction of "same sign" | N.A. | 9.99872e-01 (±4.57e-06) | N.A. | 9.99869e-01 (±1.13e-05) | N.A. | 9.99848e-01 (±7.02e-06) |
| D7 | time [sec] | 6.13e-02 (±1.08e-02) | 5.72e-03 (±3.56e-04) | 6.63e-02 (±1.21e-02) | 5.57e-03 (±2.49e-04) | 8.07e-02 (±7.28e-03) | 5.86e-03 (±2.83e-04) |
| D8 | fraction of "same sign" | N.A. | 9.99925e-01 (±6.66e-07) | N.A. | 9.99916e-01 (±8.98e-07) | N.A. | 9.99906e-01 (±3.52e-07) |
| D8 | time [sec] | 2.60e+01 (±1.30e+00) | 1.40e-01 (±1.01e-02) | 2.90e+01 (±4.89e+00) | 1.53e-01 (±4.08e-03) | 4.02e+01 (±1.18e+00) | 3.44e-01 (±1.18e-02) |

Table 10: Results on sensitivity analysis of class labels for various values of $(n_A + n_R)/n_{\mathrm{old}}$ (column groups). The fraction of 
the test instances whose lower and upper bounds of the decision score have the same sign, and the computation 
time in seconds, are listed ($\lambda = 1$). 




| Dataset | Quantity | 10%: Incremental | 10%: proposed | 50%: Incremental | 50%: proposed | 99%: Incremental | 99%: proposed |
|---|---|---|---|---|---|---|---|
| D5 | fraction of "same sign" | N.A. | 9.26417e-01 (±4.66e-03) | N.A. | 9.83689e-01 (±3.86e-03) | N.A. | 9.89409e-01 (±4.12e-03) |
| D5 | time [sec] | 8.02e-03 (±2.32e-05) | 4.12e-04 (±1.18e-05) | 3.62e-02 (±1.79e-03) | 4.33e-04 (±2.26e-05) | 8.14e-02 (±1.27e-02) | 4.29e-04 (±1.72e-05) |
| D6 | fraction of "same sign" | N.A. | 1.00000e+00 (±0.00e+00) | N.A. | 1.00000e+00 (±0.00e+00) | N.A. | 1.00000e+00 (±0.00e+00) |
| D6 | time [sec] | 1.07e-02 (±6.79e-05) | 2.70e-03 (±1.55e-04) | 6.07e-02 (±1.15e-02) | 2.64e-03 (±7.42e-05) | 1.04e-01 (±1.84e-02) | 2.66e-03 (±1.68e-04) |
| D7 | fraction of "same sign" | N.A. | 9.91657e-01 (±5.34e-03) | N.A. | 9.99136e-01 (±2.99e-04) | N.A. | 9.99477e-01 (±1.69e-04) |
| D7 | time [sec] | 1.95e-02 (±1.13e-03) | 6.54e-03 (±6.54e-04) | 1.36e-01 (±3.16e-02) | 6.19e-03 (±5.76e-04) | 2.02e-01 (±2.81e-02) | 6.36e-03 (±5.60e-04) |
| D8 | fraction of "same sign" | N.A. | 9.94789e-01 (±5.11e-04) | N.A. | 9.99324e-01 (±3.60e-05) | N.A. | 9.99582e-01 (±1.54e-05) |
| D8 | time [sec] | 2.54e+01 (±8.29e-01) | 1.54e-01 (±7.58e-03) | 9.25e+01 (±9.54e+00) | 1.57e-01 (±6.73e-03) | 1.39e+02 (±1.75e+01) | 1.54e-01 (±5.01e-03) |

Table 11: Results on sensitivity analysis of class labels for various values of $n_{\mathrm{old}}/n_{\mathrm{train}}$ (column groups). The fraction of the 
test instances whose lower and upper bounds of the decision score have the same sign, and the computation 
time in seconds, are listed ($\lambda = 0.01$). 

| Dataset | Quantity | 10%: Incremental | 10%: proposed | 50%: Incremental | 50%: proposed | 99%: Incremental | 99%: proposed |
|---|---|---|---|---|---|---|---|
| D5 | fraction of "same sign" | N.A. | 9.82679e-01 (±5.55e-16) | N.A. | 9.96878e-01 (±8.31e-04) | N.A. | 9.97818e-01 (±6.69e-04) |
| D5 | time [sec] | 6.01e-03 (±5.17e-05) | 4.63e-04 (±5.10e-05) | 3.02e-02 (±7.00e-03) | 4.42e-04 (±1.61e-05) | 5.33e-02 (±1.61e-02) | 4.39e-04 (±2.81e-05) |
| D6 | fraction of "same sign" | N.A. | 1.00000e+00 (±0.00e+00) | N.A. | 1.00000e+00 (±0.00e+00) | N.A. | 1.00000e+00 (±0.00e+00) |
| D6 | time [sec] | 9.40e-03 (±7.60e-05) | 2.62e-03 (±1.55e-04) | 4.33e-02 (±9.01e-03) | 2.60e-03 (±1.33e-04) | 7.80e-02 (±1.10e-02) | 2.64e-03 (±1.30e-04) |
| D7 | fraction of "same sign" | N.A. | 9.99998e-01 (±3.26e-06) | N.A. | 9.99998e-01 (±1.84e-06) | N.A. | 1.00000e+00 (±0.00e+00) |
| D7 | time [sec] | 1.63e-02 (±5.24e-04) | 4.81e-03 (±1.92e-04) | 4.49e-02 (±6.80e-03) | 4.84e-03 (±2.65e-04) | 7.20e-02 (±1.51e-02) | 4.97e-03 (±2.62e-04) |
| D8 | fraction of "same sign" | N.A. | 9.99756e-01 (±1.59e-05) | N.A. | 9.99940e-01 (±3.20e-06) | N.A. | 9.99951e-01 (±8.67e-07) |
| D8 | time [sec] | 1.38e+01 (±2.57e-01) | 1.56e-01 (±1.11e-02) | 4.20e+01 (±4.04e-01) | 1.55e-01 (±5.68e-03) | 6.59e+01 (±2.39e+00) | 1.54e-01 (±6.09e-03) |

Table 12: Results on sensitivity analysis of class labels for various values of $n_{\mathrm{old}}/n_{\mathrm{train}}$ (column groups). The fraction of the 
test instances whose lower and upper bounds of the decision score have the same sign, and the computation 
time in seconds, are listed ($\lambda = 0.1$). 




| Dataset | Quantity | 10%: Incremental | 10%: proposed | 50%: Incremental | 50%: proposed | 99%: Incremental | 99%: proposed |
|---|---|---|---|---|---|---|---|
| D5 | fraction of "same sign" | N.A. | 1.00000e+00 (±0.00e+00) | N.A. | 1.00000e+00 (±0.00e+00) | N.A. | 1.00000e+00 (±0.00e+00) |
| D5 | time [sec] | 3.09e-03 (±1.50e-05) | 3.83e-04 (±6.77e-06) | 1.35e-02 (±2.12e-03) | 4.01e-04 (±7.79e-06) | 1.73e-02 (±2.21e-03) | 4.13e-04 (±4.42e-06) |
| D6 | fraction of "same sign" | N.A. | 1.00000e+00 (±0.00e+00) | N.A. | 1.00000e+00 (±0.00e+00) | N.A. | 1.00000e+00 (±0.00e+00) |
| D6 | time [sec] | 7.36e-03 (±3.23e-05) | 1.86e-03 (±1.46e-05) | 1.63e-02 (±6.71e-04) | 2.17e-03 (±8.19e-05) | 2.78e-02 (±9.21e-04) | 2.22e-03 (±4.33e-05) |
| D7 | fraction of "same sign" | N.A. | 9.99754e-01 (±3.28e-05) | N.A. | 9.99885e-01 (±1.50e-05) | N.A. | 9.99865e-01 (±6.89e-06) |
| D7 | time [sec] | 1.66e-02 (±9.04e-04) | 6.38e-03 (±3.57e-04) | 5.31e-02 (±1.00e-02) | 5.93e-03 (±7.21e-04) | 7.31e-02 (±1.28e-02) | 5.86e-03 (±7.41e-04) |
| D8 | fraction of "same sign" | N.A. | 9.99735e-01 (±2.19e-05) | N.A. | 9.99915e-01 (±1.49e-06) | N.A. | 9.99916e-01 (±8.67e-07) |
| D8 | time [sec] | 6.87e+00 (±8.04e-01) | 1.48e-01 (±1.05e-02) | 2.36e+01 (±3.73e+00) | 1.56e-01 (±6.65e-03) | 2.83e+01 (±2.54e+00) | 1.52e-01 (±3.29e-03) |

Table 13: Results on sensitivity analysis of class labels for various values of $n_{\mathrm{old}}/n_{\mathrm{train}}$ (column groups). The fraction of the 
test instances whose lower and upper bounds of the decision score have the same sign, and the computation 
time in seconds, are listed ($\lambda = 1$). 

| Dataset | Model | SVM, existing: incremental | SVM, existing: […] | SVM, existing: […] | SVM, proposed: op1 | SVM, proposed: op2 | Logistic regression, existing: incremental | Logistic regression, proposed: op1 | Logistic regression, proposed: op2 |
|---|---|---|---|---|---|---|---|---|---|
| D1 | linear | 13.76 | 10.46 | 10.37 | 7.74 | 5.17 | 13.95 | 8.32 | 2.78 |
| D1 | nonlinear | 33.19 | 24.53 | 15.63 | 17.33 | 8.48 | 29.31 | 17.23 | 6.55 |
| D2 | linear | 52.87 | 51.04 | 44.28 | 29.01 | 15.93 | 58.22 | 28.63 | 13.25 |
| D2 | nonlinear | 337.87 | 312.44 | 246.10 | 201.58 | 124.79 | 268.71 | 165.42 | 120.57 |
| D3 | linear | 1167.65 | 458.12 | 229.57 | 203.60 | 123.69 | 4075.96 | 726.68 | 346.30 |
| D3 | nonlinear | 96317.45 | 77562.37 | 46427.27 | 58480.90 | 46301.53 | 91503.32 | 34972.51 | 28856.92 |
| D4 | linear | 18824.74 | 14303.27 | 12177.41 | 8506.30 | 2088.63 | 25197.11 | 10563.92 | 1219.36 |
| D4 | nonlinear | > 3 days | > 3 days | 183208.76 | 202972.55 | 106169.25 | > 3 days | 125300.26 | 47474.64 |

Table 14: Computation time [sec] of model selection based on LOOCV (without tricks). 



| Dataset | Model | SVM, existing: incremental | SVM, existing: […] | SVM, existing: […] | SVM, proposed: op1 | SVM, proposed: op2 | Logistic regression, existing: incremental | Logistic regression, proposed: op1 | Logistic regression, proposed: op2 |
|---|---|---|---|---|---|---|---|---|---|
| D1 | linear | 8.88 | 8.69 | 8.53 | 5.79 | 4.45 | 5.41 | 3.96 | 1.66 |
| D1 | nonlinear | 14.59 | 13.13 | 10.26 | 9.00 | 4.50 | 12.32 | 7.42 | 2.54 |
| D2 | linear | 17.88 | 17.88 | 16.65 | 1.44 | 0.83 | 20.49 | 1.20 | 0.55 |
| D2 | nonlinear | 164.39 | 151.57 | 138.15 | 106.15 | 47.66 | 125.23 | 76.95 | 44.44 |
| D3 | linear | 693.34 | 345.10 | 226.65 | 197.68 | 124.81 | 2012.49 | 563.09 | 322.47 |
| D3 | nonlinear | 9018.88 | 5805.36 | 4772.59 | 1898.11 | 1352.38 | 8495.91 | 1184.83 | 745.02 |
| D4 | linear | 6132.45 | 5536.21 | 4121.21 | 353.21 | 93.67 | 12027.28 | 663.19 | 187.13 |
| D4 | nonlinear | 168806.92 | 139810.43 | 122264.81 | 46166.76 | 23032.34 | 143660.82 | 35676.66 | 14920.24 |

Table 15: Computation time [sec] of model selection based on LOOCV (with tricks). 